Potential Outcomes Framework
(and More)

PSCI 8357 - STAT II

Georgiy Syunyaev

Department of Political Science, Vanderbilt University

January 21, 2026

Potential Outcomes Framework

What is Causal Inference?


  • Causal inference = inference about counterfactuals
  • Examples:

    • Incumbency advantage:

      What would have been the election outcome if the candidate had not been an incumbent?

    • Democratic peace:

      Would the two countries have fought each other if they had both been autocratic?

    • Policy intervention:

      How many more disadvantaged youths would get employed under the new job training program?

  • Problem: We need a statistical framework that can explicitly distinguish factuals and counterfactuals.

Potential Outcomes Framework to the Rescue



 

Jerzy Neyman (1894–1981)

 

Donald Rubin (1943–)

 

Neyman Urn Model

More Formally

DEFINITION: Treatment

\(T_i\): Indicator of treatment intake for unit \(i\), where \(i = 1, ..., N\)

\[ T_i = \begin{cases} 1 & \text{if unit } i \text{ received the treatment} \\ 0 & \text{otherwise} \end{cases} \]

DEFINITION: Observed Outcome

\(Y_i\): Variable of interest whose value may be affected by the treatment

DEFINITION: Potential Outcome

\(Y_{i} (t)\): Value of the outcome that would be realized if unit \(i\) received the treatment \(t\), where \(t \in \{ 0, 1\}\)

\[ Y_{i} (t) = \begin{cases} Y_{i} (1) & \text{Potential outcome for unit } i \text{ under treatment} \\ Y_{i} (0) & \text{Potential outcome for unit } i \text{ under no treatment} \end{cases} \]

Causal Effects with Potential Outcomes

DEFINITION: Unit Treatment Effect

Causal effect of the treatment on the outcome for unit \(i\) is the difference between its two potential outcomes:

\[ \tau_i = Y_{i} (1) - Y_{i} (0) \]

  • What we observe is just the realization of potential outcomes:

\[ Y_i = \begin{cases} Y_{i} (1) & \text{if } T_i=1 \\ Y_{i} (0) & \text{if } T_i=0 \end{cases} \]

  • Hence observed outcomes are given by the switching equation: \(Y_i = Y_i ({T_i}) = T_i Y_{i} (1) + (1-T_i) Y_{i} (0)\)
  • Fundamental Problem of Causal Inference (Holland 1986):

    • We can never observe both \(Y_{i}(1)\) and \(Y_{i}(0)\) for the same \(i\)
    • This makes \(\tau_i\) unidentifiable without further assumptions.
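To make the switching equation concrete, here is a minimal simulation sketch in Python (the data-generating process, parameter values, and seed are invented for illustration): both potential outcomes exist inside the simulation, but only one per unit is ever realized.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5

Y0 = rng.normal(0, 1, N)         # potential outcomes under control, Y_i(0)
Y1 = Y0 + rng.normal(1, 1, N)    # potential outcomes under treatment, Y_i(1)
tau = Y1 - Y0                    # unit-level effects: never observable in data

T = rng.integers(0, 2, N)        # treatment intake
Y = T * Y1 + (1 - T) * Y0        # switching equation: the observed outcome

print(Y)                         # one realized outcome per unit
print(np.where(T == 1, Y0, Y1))  # the missing counterfactual for each unit
```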

Causal Inference as a Missing Data Problem

  • Causal effect (or treatment effect) for unit \(i\) is \[ \tau_i = Y_{i}(1) - Y_{i}(0) \]
  • For treated unit \(i\) with \(T_i = 1\) we observe \(Y_i(1)\), so \[ \tau_i = \underbrace{Y_{i}}_{\text{observed}} - \underbrace{Y_{i}(0)}_{\text{unobserved}} \]
  • Intuition: We want to “impute” counterfactual outcome \(Y_i(0)\) for treated units
  • The opposite is true for a control unit \(i\) with \(T_i = 0\)
  • Without assumptions, it is in general impossible to learn about causal effects \(\rightarrow\) we can think of causal inference as a framework that helps:

    • Develop designs and clarify reasonable and interpretable assumptions that we need to make to infer about counterfactual outcomes.

Causal Inference as a Missing Data Problem


  • Problem: We only observe one of the potential outcomes, so how can we learn about \(\tau_i = Y_{i}(1) - Y_{i}(0)\)?
  • One “heroic solution” is to assume unit homogeneity

    • If \(Y_{i} (1)\) and \(Y_{i} (0)\) are constant across individual units, then cross-sectional comparisons will recover \(\tau = \tau_i\)

    • If \(Y_{i} (1)\) and \(Y_{i} (0)\) are constant across time, then before-and-after comparisons will recover \(\tau = \tau_i\)

  • This may sometimes be plausible in physics or chemistry, but it is almost never true in the social sciences.
  • Our best hope is to try to recover some aggregated quantities of interest, or causal estimands

Causal Estimands

Back to the Neyman Urn Model

Causal Quantities of Interest, or Causal Estimands


  • Unit-level causal effects are fundamentally unobservable \(\rightarrow\) focus on averages in most situations.

DEFINITION: Average Treatment Effect (ATE)

\[ \begin{align*} \tau_{ATE} &= \frac{1}{N}\sum_{i=1}^N \left\{Y_{i}(1) - Y_{i}(0) \right\} &&\textit{(finite-population)}\\ \tau_{ATE} &= {\mathbb{E}}[Y_{i}(1) - Y_{i}(0)] &&\textit{(super-population)} \end{align*} \]

  • Example: The average effect of a GOTV mail on voter turnout.

  • Note that \(\tau_{ATE}\) is still unidentified

  • In the rest of this course, we will consider various assumptions under which \(\tau_{ATE}\) can be identified from observed information

Causal Quantities of Interest, or Causal Estimands


DEFINITION: Average Treatment Effect on the Treated (ATT)

Let \(N_1 \equiv \sum_{i=1}^N T_i\), then

\[ \begin{align*} \tau_{ATT} &= \frac{1}{N_1}\sum_{i=1}^N T_i\left\{Y_{i} (1) - Y_{i} (0) \right\} &&\textit{(finite-population)}\\ \tau_{ATT} &= {\mathbb{E}}[Y_{i}(1) - Y_{i}(0) \mid T_i = 1] &&\textit{(super-population)} \end{align*} \]

  • Example: The average effect among people who received the mail.
  • Exercise: Define the ATE on the untreated (control) units, \(\tau_{ATC}\).

DEFINITION: Conditional Average Treatment Effect (CATE)

\[ \tau_{CATE}(x) = {\mathbb{E}}[ Y_{i}(1) - Y_{i}(0) {\:\vert\:}X_i = x] \]

  • Example: The average effect of a GOTV mail on voter turnout among female voters.
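With (hypothetical) access to both potential outcomes, all three estimands reduce to simple subgroup averages, something only a simulation permits. A minimal sketch (the covariate, effect sizes, and seed below are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000
X = rng.integers(0, 2, N)          # binary covariate (say, X = 1 for women)
Y0 = rng.normal(0, 1, N)
Y1 = Y0 + 0.5 + 0.5 * X            # treatment effect is larger when X = 1
T = rng.integers(0, 2, N)          # assigned independently of everything else

tau = Y1 - Y0                      # unit effects (known only in simulation)
print(f"ATE       = {tau.mean():.3f}")
print(f"ATT       = {tau[T == 1].mean():.3f}")
print(f"CATE(x=1) = {tau[X == 1].mean():.3f}")
print(f"CATE(x=0) = {tau[X == 0].mean():.3f}")
```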

Illustration: Average Treatment Effect

  • Suppose we observe a population of 4 units
  \(i\)   \(T_i\)   \(Y_i\)   \(Y_{i}(1)\)   \(Y_{i}(0)\)   \(\tau_i\)
   1        1         3           3              ?             ?
   2        1         1           1              ?             ?
   3        0         0           ?              0             ?
   4        0         1           ?              1             ?


  • What is our best guess about \(\tau_{ATE}={\mathbb{E}}[Y_{i}(1) - Y_{i}(0)]\)?

Illustration: Average Treatment Effect

  • Let us try to calculate our best guess
  \(i\)   \(T_i\)   \(Y_i\)   \(Y_{i}(1)\)   \(Y_{i}(0)\)   \(\tau_i\)
   1        1         3           3              ?             ?
   2        1         1           1              ?             ?
   3        0         0           ?              0             ?
   4        0         1           ?              1             ?

  \({\mathbb{E}}[Y_{i}(1) {\:\vert\:}T_i = 1] = 2\)
  \({\mathbb{E}}[Y_{i}(0) {\:\vert\:}T_i = 0] = 0.5\)


  • Observed difference in means is \(\hat\tau_{DiM} = {\mathbb{E}}[Y_{i}(1) {\:\vert\:}T_i = 1] - {\mathbb{E}}[Y_{i}(0) {\:\vert\:}T_i = 0] = 1.5\).
  • Could this be wrong? Knowing \(\tau_{ATE}={\mathbb{E}}[Y_{i} (1) - Y_{i} (0)]\) would help.
  • We need potential outcomes that we do not observe!

Illustration: Average Treatment Effect

  • Suppose hypothetically: \(Y_{1}(0) = 0, Y_{2}(0) = Y_{3}(1) = Y_{4}(1) = 1\)
  \(i\)   \(T_i\)   \(Y_i\)   \(Y_{i}(1)\)   \(Y_{i}(0)\)   \(\tau_i\)
   1        1         3           3              0             3
   2        1         1           1              1             0
   3        0         0           1              0             1
   4        0         1           1              1             0

  \({\mathbb{E}}[Y_{i}(1)] = 1.5\)
  \({\mathbb{E}}[Y_{i}(0)] = 0.5\)


  • What is \(ATE\)? \(\tau_{ATE} = {\mathbb{E}}[Y_{i}(1)-Y_{i}(0)] = {\mathbb{E}}[\tau_i] = \frac{3 + 0 + 1 + 0}{4} = 1\).
  • Why is \(\tau_{ATE} \neq \hat\tau_{DiM}\)? When would they be equal?

Illustration: Average Treatment Effect on the Treated

  • Let us look at the other estimand, the \(ATT\)
  \(i\)   \(T_i\)   \(Y_i\)   \(Y_{i}(1)\)   \(Y_{i}(0)\)   \(\tau_i\)
   1        1         3           3              0             3
   2        1         1           1              1             0
   3        0         0           1              0             1
   4        0         1           1              1             0

  \({\mathbb{E}}[Y_{i}(1) {\:\vert\:}T_i = 1] = 2\)
  \({\mathbb{E}}[Y_{i}(0) {\:\vert\:}T_i = 1] = 0.5\)


  • What is \(ATT\)? \(\tau_{ATT} = {\mathbb{E}}[Y_{i}(1)-Y_{i}(0) {\:\vert\:}T_i = 1] = {\mathbb{E}}[\tau_i {\:\vert\:}T_{i} = 1] = \frac{3 + 0}{2} = 1.5\).
  • \(\tau_{ATT} = \hat\tau_{DiM}\), but would that always be the case?
  • Why is \(\tau_{ATT} \neq \tau_{ATE}\)? When would they be equal?
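The quantities in the last three tables can be reproduced in a few lines, using the tables' hypothetical potential outcomes:

```python
import numpy as np

T  = np.array([1, 1, 0, 0])
Y1 = np.array([3, 1, 1, 1])                # hypothetical full Y_i(1) column
Y0 = np.array([0, 1, 0, 1])                # hypothetical full Y_i(0) column
Y  = np.where(T == 1, Y1, Y0)              # observed outcomes: [3, 1, 0, 1]

dim = Y[T == 1].mean() - Y[T == 0].mean()  # difference in means = 1.5
ate = (Y1 - Y0).mean()                     # ATE = 1.0
att = (Y1 - Y0)[T == 1].mean()             # ATT = 1.5

print(dim, ate, att)
```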

SUTVA?

Stable Unit Treatment Value Assumption (SUTVA)


  • Recall that we define potential outcomes as \(Y_{i}(1)\) and \(Y_{i}(0)\)
  • This implicitly makes an important assumption: potential outcomes for unit \(i\) are stable with respect to (1) other units’ treatment assignments and (2) the version of unit \(i\)’s own treatment

ASSUMPTION: SUTVA

\[ Y_{i} (\mathbf{t}) = Y_{i} (\mathbf{t^{\prime}}) \quad \text{if } t_{i} = t_{i}^{\prime} \]

  • SUTVA consists of two sub-assumptions:

    1. No interference: Potential outcomes for a unit must not be affected by the treatment of any other unit. Violations: spillover effects, contagion, dilution, displacement, communication

    2. Consistency: Nominally identical treatments are in fact identical. Violations: variable levels or versions of treatment, technical errors (e.g., uneven fertilizer application across plots)

Causal Inference without SUTVA

  • Let \(\mathbf{T}= (T_1,T_2)\) be a vector of binary treatments for \(N = 2\).
  • How many different values can \(\mathbf{T}\) possibly take? \(\textcolor{#8ec07c}{(0,0)},\, \textcolor{#928374}{(1,0)},\, \textcolor{#d65d0e}{(0,1)},\, \textcolor{#b16286}{(1,1)}\)
  • How many potential outcomes does unit \(1\) have? \(Y_{1}(\textcolor{#8ec07c}{(0,0)}),\, Y_{1}(\textcolor{#928374}{(1,0)}),\, Y_{1}(\textcolor{#d65d0e}{(0,1)}),\, Y_{1}(\textcolor{#b16286}{(1,1)})\)
  • How many causal effects for unit \(1\)? All \(\binom{4}{2} = 6\) pairwise contrasts:

\[ \begin{array}{ccc} Y_{1}(\textcolor{#b16286}{(1,1)}) - Y_{1}(\textcolor{#8ec07c}{(0,0)}), & Y_{1}(\textcolor{#b16286}{(1,1)}) - Y_{1}(\textcolor{#d65d0e}{(0,1)}), & Y_{1}(\textcolor{#b16286}{(1,1)}) - Y_{1}(\textcolor{#928374}{(1,0)}), \\ Y_{1}(\textcolor{#928374}{(1,0)}) - Y_{1}(\textcolor{#8ec07c}{(0,0)}), & Y_{1}(\textcolor{#d65d0e}{(0,1)}) - Y_{1}(\textcolor{#8ec07c}{(0,0)}), & Y_{1}(\textcolor{#928374}{(1,0)}) - Y_{1}(\textcolor{#d65d0e}{(0,1)}). \end{array} \]

  • How many observed outcomes for unit \(i\)? Only one, \(Y_i = Y_{i} ( (T_1, T_2) )\)
  • Without SUTVA, causal inference becomes EXPONENTIALLY more difficult as \(N\) increases (formally we have \(2^N\) potential outcomes).
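A quick sketch of that growth, simply enumerating the possible assignment vectors:

```python
from itertools import product

for N in (2, 3, 10):
    assignments = list(product((0, 1), repeat=N))  # all treatment vectors
    print(f"N = {N}: {len(assignments)} potential outcomes per unit")
# N = 2: 4, N = 3: 8, N = 10: 1024, i.e., 2^N
```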

Selection Bias

Selection Bias

  • Comparisons of observed outcomes do not usually give the right answer

\[ \begin{align*} \hat\tau &= {\mathbb{E}}[Y_i {\:\vert\:}T_i=1]-{\mathbb{E}}[Y_i {\:\vert\:}T_i=0] &&\\ &= {\mathbb{E}}[Y_{i} (1) {\:\vert\:}T_i=1]-{\mathbb{E}}[Y_{i} (0) {\:\vert\:}T_i=0] \quad \text{($\because$ switching equation)}\\ &= \underbrace{{\mathbb{E}}[Y_{i} (1) - Y_{i} (0) {\:\vert\:}T_i=1]}_{\tau_{ATT}} + \underbrace{{\mathbb{E}}[Y_{i} (0) {\:\vert\:}T_i=1]-{\mathbb{E}}[Y_{i} (0) {\:\vert\:}T_i=0]}_{\text{Selection bias}} \quad \text{($\because \pm {\mathbb{E}}[Y_{i} (0) {\:\vert\:}T_i=1]$)} \end{align*} \]

  • Bias term \(\neq 0\) if selection into treatment is associated with potential outcomes.
  • Example: Church attendance and turnout

    • Churchgoers differ from individuals who do not attend church in many ways (e.g., civic duty).
    • Turnout for churchgoers would be higher than for non-churchgoers even if churchgoers never attended church. \(\rightarrow\) \({\mathbb{E}}[Y_{i} (0) {\:\vert\:}T_i=1] - {\mathbb{E}}[Y_{i} (0) {\:\vert\:}T_i=0] > 0\)
  • Example: Job training program for the disadvantaged

    • Participants are self-selected from a subpopulation of individuals in difficult labor situations.
    • Post-training period earnings for participants would be lower than those for nonparticipants in the absence of the program. \(\rightarrow\) \({\mathbb{E}}[Y_{i} (0) {\:\vert\:}T_i=1] - {\mathbb{E}}[Y_{i} (0) {\:\vert\:}T_i=0] < 0\)
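A simulation sketch of the decomposition, with selection on \(Y_i(0)\) built in (the DGP and parameter values are invented; the bias is positive here, as in the church attendance example):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1_000_000
Y0 = rng.normal(0, 1, N)                        # baseline outcome, Y_i(0)
Y1 = Y0 + 1.0                                   # constant treatment effect of 1
T = (Y0 + rng.normal(0, 1, N) > 0).astype(int)  # selection on Y_i(0)

Y = np.where(T == 1, Y1, Y0)
dim  = Y[T == 1].mean() - Y[T == 0].mean()      # difference in means
att  = (Y1 - Y0)[T == 1].mean()                 # = 1 by construction
bias = Y0[T == 1].mean() - Y0[T == 0].mean()    # selection bias term

print(f"DiM = {dim:.3f} = ATT ({att:.3f}) + bias ({bias:.3f})")
```

In the sample the identity holds exactly: the difference in means equals the ATT plus the selection bias term.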

Other Decompositions

  • Can we instead decompose the difference in means, \(\hat\tau\), around \(\tau_{ATC}\) and selection bias with respect to \(Y_i(1)\)?

\[ \begin{align*} \hat\tau &= {\mathbb{E}}[Y_i {\:\vert\:}T_i = 1]-{\mathbb{E}}[Y_i {\:\vert\:}T_i = 0] \\ &= \underbrace{{\mathbb{E}}[Y_{i} (1) - Y_{i} (0) {\:\vert\:}T_i = 0]}_{\tau_{ATC}} + \underbrace{{\mathbb{E}}[Y_{i} (1) {\:\vert\:}T_i=1]-{\mathbb{E}}[Y_{i} (1) {\:\vert\:}T_i=0]}_{\text{Selection bias wrt $Y_i(1)$}} \end{align*} \]

  • Can we decompose the difference in means, \(\hat\tau\), using \(ATE\) instead of \(ATT\) or \(ATC\)?

\[ \begin{multline} {\mathbb{E}}[Y_i {\:\vert\:}T_i = 1] - {\mathbb{E}}[Y_i {\:\vert\:}T_i = 0] = \tau_{ATE} \\ + \underbrace{{\mathbb{E}}[Y_{i}(0) {\:\vert\:}T_i = 1] - {\mathbb{E}}[Y_{i}(0) {\:\vert\:}T_i = 0]}_{\text{Selection bias wrt $Y_i(0)$}} + (1 - \pi)(\underbrace{{\mathbb{E}}[\tau_{i} {\:\vert\:}T_i = 1] - {\mathbb{E}}[\tau_{i} {\:\vert\:}T_i = 0]}_{\text{Selection bias wrt $\tau_i$}}), \\\text{where } \pi = {\textrm{Pr}}[T_i = 1]. \end{multline} \]

  • Intuition: What do these decompositions tell us about the possibility of recovering \(ATT\)/\(ATC\) vs \(ATE\)?

  • Note: This could be rewritten in terms of selection bias wrt \(Y_i(1)\) and \(\tau_i\) as well.
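Where does the last display come from? By the law of total expectation (a short derivation, added for completeness):

\[ \begin{align*} \tau_{ATE} &= \pi\,{\mathbb{E}}[\tau_i {\:\vert\:}T_i = 1] + (1-\pi)\,{\mathbb{E}}[\tau_i {\:\vert\:}T_i = 0], \\ \text{so}\quad \tau_{ATT} &= \tau_{ATE} + (1-\pi)\left({\mathbb{E}}[\tau_i {\:\vert\:}T_i = 1] - {\mathbb{E}}[\tau_i {\:\vert\:}T_i = 0]\right). \end{align*} \]

Substituting this expression for \(\tau_{ATT}\) into the first decomposition, \(\hat\tau = \tau_{ATT} + (\text{selection bias wrt } Y_i(0))\), yields the display above.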

Components of Causal Inference

Identification \(\neq\) Estimation and Inference


  1. (Causal) Identification

    • With an infinite amount of data, can we learn about our causal estimand? \(\Rightarrow\) Identification is independent of the dataset size.
    • Focus on causal estimand (not the same as statistical estimand).
    • Can we express causal estimands solely in terms of observed outcomes?
    • What (if any) identification assumptions do we need for this?
  2. Estimation and Inference (standard statistics)

    • Given the finite amount of data available, how well can we learn about the statistical estimand (which equals the causal estimand under identification)?
    • This involves finding a point estimate, confidence interval, and \(p\)-value.

Causal Identification under the Potential Outcomes


Example of Identifying Assumption: Random Assignment

  • Under random assignment we have Strong Ignorability: \(T_i {\mbox{$\perp\!\!\!\perp$}}(Y_{i}(0), Y_{i}(1))\) and \(0 < {\textrm{Pr}}(T_i = 1) < 1\)

\[ {\mathbb{E}}[ Y_{i} (0) | T_i = 1] = {\mathbb{E}}[Y_{i} (0) | T_i = 0] = {\mathbb{E}}[Y_{i} (0)] \implies \text{no selection bias} \]

  • Why?
  • Also under random assignment:

\[ {\mathbb{E}}[Y_{i} (1) | T_i = 1] - {\mathbb{E}}[Y_{i} (0) | T_i = 1] = {\mathbb{E}}[Y_{i} (1) - Y_{i} (0)] \quad \text{($ATT$ is the same as $ATE$)} \]

  • Why?
  • So, difference in means equals the \(ATE\) under random assignment!
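A minimal simulation sketch of this result (the DGP and seed are invented): randomizing \(T\) makes it independent of the potential outcomes, so the difference in means recovers the ATE.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1_000_000
Y0 = rng.normal(0, 1, N)
Y1 = Y0 + 2.0                  # true ATE = 2
T = rng.integers(0, 2, N)      # randomized: independent of (Y0, Y1)

Y = np.where(T == 1, Y1, Y0)
dim = Y[T == 1].mean() - Y[T == 0].mean()
print(f"difference in means = {dim:.3f} (true ATE = 2)")  # ~ 2, no bias
```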

Causal Identification under the Potential Outcomes

Example of Identifying Assumption: Conditional Ignorability

  • Under Conditional Ignorability we have: \(T_i {\mbox{$\perp\!\!\!\perp$}}(Y_{i}(0), Y_{i}(1)) {\:\vert\:}X_i\) and \(\forall x \in \mathcal{X}:\: 0 < {\textrm{Pr}}(T_i = 1 {\:\vert\:}X_i = x) < 1\)
  • Conditional Average Treatment Effect is

\[ {\mathbb{E}}[ Y_{i} (1) - Y_{i} (0) | X_i = x] = \tau_{CATE}(x) \]

  • and we can show that

\[ {\mathbb{E}}[Y_{i} | T_i = 1, X_i = x] - {\mathbb{E}}[Y_{i} | T_i = 0, X_i = x] = \tau_{CATE} (x) \]

  • How?
  • Averaging across \(\mathcal{X}\) will give us

\[ \sum_{x \in \mathcal{X}} \tau_{CATE} (x) p(x) = \tau_{ATE} \]
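A sketch of the whole chain under conditional ignorability (invented DGP in which \(X\) drives both treatment take-up and the baseline outcome, so the naive comparison is confounded):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1_000_000
X = rng.integers(0, 2, N)                       # confounder
T = (rng.random(N) < np.where(X == 1, 0.8, 0.2)).astype(int)
Y0 = X + rng.normal(0, 1, N)                    # X raises the baseline
Y1 = Y0 + 1.0                                   # true ATE = 1
Y = np.where(T == 1, Y1, Y0)

naive = Y[T == 1].mean() - Y[T == 0].mean()     # biased by confounding
cate = [Y[(T == 1) & (X == x)].mean() - Y[(T == 0) & (X == x)].mean()
        for x in (0, 1)]                        # stratum-level DiM = CATE(x)
ate_hat = sum(c * (X == x).mean() for c, x in zip(cate, (0, 1)))  # over p(x)

print(f"naive DiM = {naive:.3f}, stratified estimate = {ate_hat:.3f}")  # ~ 1
```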

Principles in Causal Inference

  • Separation of Causal Estimands, Identification, and Estimation/Inference

    • Step 1: Always consider causal estimands first.
    • Step 2: Determine whether and how we can identify causal estimands.
    • Step 3: If our causal estimand is identified, consider how to estimate and infer about causal estimands.
  • Identification Strategies (Designs)

    • E.g., randomized experiment, conditional ignorability, absence of omitted variables

    • Or instrumental variables, Regression Discontinuity (RDD), Difference-in-Differences (DID)

  • Estimation Strategies

    • E.g., linear regression, logistic regression, Maximum Likelihood Estimation (MLE), and Bayesian models

Causal Diagrams

An Alternative Causal Model: Causal Graphs

  • Did social scientists do causal inference before Rubin? YES!
  • The old paradigm: structural equation modeling (SEM) and path analysis
  1. Postulate a causal mechanism and draw a corresponding path diagram.
  2. Translate it into a (typically linear) system of equations:

\[ \begin{align*} Y &= \alpha_0 + \alpha_1 Z + \alpha_2 X_1 + \alpha_3 X_3 + \epsilon_\alpha, \\ Z &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \epsilon_\beta, \\ &\;\;\vdots \end{align*} \]

  3. Estimate \(\alpha, \beta\), etc., typically assuming normality and exogeneity.

[Path diagram: Personal Well-Being (X1) → Voter Behavior (Y); X1 → Political Satisfaction (Z); Social Trust (X2) → Z; Z → Y; Media Exposure (X3) → Y and Z; Social Networks (X4) → Z]

  • Critique: Strong distributional/functional form assumptions and no language to distinguish causation from association

Pearl’s Attack


  • Judea Pearl (1936–) proposed a new causal inference framework based on nonparametric structural equation modeling (NPSEM)
  • Originally a computer scientist
  • Previous important work on artificial intelligence
  • Causality (Pearl 2009)
  • Won the Turing Award in 2011 for his causal work
  • Pearl’s framework builds on SEMs and revives them as a formal language of causality.

DAGs


  • A causal diagram is a directed acyclic graph (DAG) composed of:

    • Nodes (representing variables in the causal model)
    • Directed edges or arrows (representing possible causal effect)

[DAG 1: X → T, X → Y, T → Y]

[DAG 2: as DAG 1, with error terms U_X → X, U_T → T, U_Y → Y drawn explicitly]

[DAG 3: as DAG 1, with an additional curved edge between T and Y (e.g., a dashed arc for an unobserved common cause)]

[DAG 4: X → T → Y]

  • Exogenous variables not explicitly modeled (errors) can be omitted from a graph
  • Relationships involving unobserved variables are often represented by dashed/dotted lines
  • Missing edges encode causal assumptions:

    • Missing arrows encode exclusion restrictions

    • Missing dashed arcs encode independencies between error terms

NPSEM and Treatments


A causal DAG has a one-to-one relationship with an NPSEM:

\[ \begin{align*} X &= f_X(U_X), \\ T &= f_T(X, U_T), \\ Y &= f_Y(T, X, U_Y) \end{align*} \]

[DAG: X → T, X → Y, T → Y, with error terms U_X → X, U_T → T, U_Y → Y]

  • These are structural equations (as opposed to algebraic ones) and represent causation: the equal signs are thus directional (i.e., terms cannot be rearranged across them)

  • Treatments (interventions) are represented by the \(do()\) operator

  • For example, \(do(x_0)\) holds \(X\) at \(x_0\) exogenously and removes the “backdoor path”:

\[ X = x_0, \quad T = f_T (x_0, U_T),\quad Y = f_Y(T, x_0, U_Y) \]
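A toy NPSEM along these lines makes the \(do()\) operator concrete (the functional forms below are invented for illustration; \(do(x_0)\) simply replaces \(f_X\) with the constant \(x_0\)):

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate(n, do_x=None):
    """Draw from the toy NPSEM; do_x holds X fixed exogenously."""
    U_X, U_T, U_Y = rng.normal(size=(3, n))
    X = U_X if do_x is None else np.full(n, do_x)  # X = f_X(U_X), or do(x0)
    T = (X + U_T > 0).astype(float)                # T = f_T(X, U_T)
    Y = 2.0 * T + X + U_Y                          # Y = f_Y(T, X, U_Y)
    return X, T, Y

_, _, y_pre = simulate(200_000)              # pre-intervention distribution
_, _, y_do  = simulate(200_000, do_x=1.0)    # post-intervention, do(X = 1)
print(y_pre.mean(), y_do.mean())
```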

Causal Effects and Identification


  • The pre-intervention distribution: \(p(y, t, x)\)

  • The post-intervention distribution: \(p(y, x {\:\vert\:}do(t_0))\)

  • The average treatment effect of \(T\) on \(Y\), \(\tau_{ATE}\), can be defined as the average difference in \(Y\) between two intervention regimes:

\[ {\mathbb{E}}[Y {\:\vert\:}do(t_1)] - {\mathbb{E}}[Y {\:\vert\:}do(t_0)] \]

  • (Causal) Identification: Can \(p(y {\:\vert\:}do(t))\) be estimated from data governed by the pre-intervention distribution \(p(y, t, x)\)?

Useful Intuition from DAGs


  • A collider is a node in a DAG into which two or more arrows point.
  • Collider bias: Conditioning on a collider can induce a relationship between its parents
  • Example:

    • Admissions process where \(T\) is students’ grades, \(Y\) is motivation, both influencing \(X\), the admission decision.

    • Conditioning on \(X\) (e.g., looking only at admitted students) can create a misleading relationship between \(T\) and \(Y\)

[DAG: Grades (T) → Admitted (X) ← Motivation (Y)]
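A simulation sketch of the admissions example (the threshold and distributions are invented): grades and motivation are independent overall, but negatively correlated among the admitted.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 100_000
T = rng.normal(size=N)                   # grades
Y = rng.normal(size=N)                   # motivation, independent of grades
admitted = T + Y > 1.0                   # collider: admission decision

print(np.corrcoef(T, Y)[0, 1])                      # ~ 0 in the population
print(np.corrcoef(T[admitted], Y[admitted])[0, 1])  # < 0 among the admitted
```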

Useful Intuition from DAGs


  • An M-structure forms when two variables with no direct causal path between them have distinct causes that share a common effect.
  • M-bias: Conditioning on a common effect can create spurious correlation between otherwise uncorrelated variables.
  • Example:

    • In Pearl’s lung cancer example, \(T\) is smoking, \(X\) is wearing a seatbelt, and \(Y\) is lung cancer; \(U_1\) is adherence to social norms and \(U_2\) is adherence to health norms.

    • Conditioning on \(X\) creates a false correlation between \(T\) and \(Y\), producing M-bias.

[DAG (M-structure): U1 → Smoking (T); U1 → Seatbelt (X); U2 → Seatbelt (X); U2 → Lung cancer (Y)]
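A simulation sketch of M-bias for this DAG (signs and distributions are invented; note that the simulated model has no arrow from \(T\) to \(Y\)):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 500_000
U1 = rng.normal(size=N)                    # adherence to social norms
U2 = rng.normal(size=N)                    # adherence to health norms
T = U1 + rng.normal(size=N) > 0            # smoking
X = U1 + U2 + rng.normal(size=N) > 0       # seatbelt use (the collider)
Y = U2 + rng.normal(size=N)                # lung-cancer risk

print(Y[T].mean() - Y[~T].mean())          # ~ 0: T and Y are independent
print(Y[T & X].mean() - Y[~T & X].mean())  # != 0: bias from conditioning on X
```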

DAGs \(\leftrightarrow\) Potential Outcomes


  • A causal model represented as a graph can be translated into potential outcomes.
  • For example the following NPSEM

    \[ X = f_X(U_X), \quad T = f_T(X, U_T), \quad Y = f_Y(T, U_Y) \]

    directly corresponds to the following potential outcomes: \(X_i\), \(T_{i} (X_i)\), and \(Y_{i} (T_i)\).

  • Because of this fundamental equivalence, we will mostly work with potential outcomes, currently the standard framework in the social sciences.

  • Note: Graphs are useful for expressing and visualizing a causal model in empirical research.

Potential Outcomes vs. DAGs Controversy


  • Imbens and Rubin (2015):

    Pearl’s work is interesting, and many researchers find his arguments that path diagrams are a natural and convenient way to express assumptions about causal structures appealing. In our own work, perhaps influenced by the type of examples arising in social and medical sciences, we have not found this approach to aid drawing of causal inferences.

  • Pearl’s blog post:

    So, what is it about epidemiologists that drives them to seek the light of new tools, while economists seek comfort in partial blindness, while missing out on the causal revolution? Can economists do in their heads what epidemiologists observe in their graphs? Can they, for instance, identify the testable implications of their own assumptions? Can they decide whether the IV assumptions are satisfied in their own models of reality? Of course they can’t; such decisions are intractable to the graph-less mind.

Further Readings on DAGs


References

Bareinboim, Elias, and Judea Pearl. 2016. “Causal Inference and the Data-Fusion Problem.” Proceedings of the National Academy of Sciences 113 (27): 7345–52. https://doi.org/10.1073/pnas.1510507113.
Hernán, Miguel A., and James M. Robins. 2020. Causal Inference: What If. Chapman & Hall/CRC.
Holland, Paul W. 1986. “Statistics and Causal Inference.” Journal of the American Statistical Association 81 (396): 945–60.
Imai, Kosuke, Luke Keele, and Dustin Tingley. 2010. “A General Approach to Causal Mediation Analysis.” Psychological Methods 15 (4): 309–34. https://doi.org/10.1037/a0020761.
Ogburn, Elizabeth L., and Tyler J. VanderWeele. 2014. “Causal Diagrams for Interference.” Statistical Science 29 (4): 559–78. https://doi.org/10.1214/14-STS501.
Pearl, Judea. 1995. “Causal Diagrams for Empirical Research.” Biometrika 82 (4): 669–88. https://doi.org/10.1093/biomet/82.4.669.
———. 2001. “Direct and Indirect Effects.” Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI), 411–20.
———. 2009. Causality. Cambridge University Press.
Shpitser, Ilya, and Judea Pearl. 2006. “Identification of Joint Interventional Distributions in Recursive Semi-Markovian Causal Models.” Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI), 1219–26.
Sinclair, Betsy, Margaret McConnell, and Donald P Green. 2012. “Detecting Spillover Effects: Design and Analysis of Multilevel Experiments.” American Journal of Political Science 56 (4): 1055–69.
Tian, Jin, and Judea Pearl. 2002. “A General Identification Condition for Causal Effects.” In Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI), 567–73.